Query: Adds ability to choose global vs local/focused statistics for FullTextScore #5582

Merged
microsoft-github-policy-service[bot] merged 8 commits into master from users/ndeshpan/ftsLocalStatistics on Feb 6, 2026

Conversation

neildsh (Contributor) commented Jan 30, 2026

Enabling users to choose global vs local/focused statistics for FullTextScore

Why?

Cosmos DB’s implementation of FullTextScore computes BM25 statistics (term frequency, inverse document frequency, and document length) across all documents in the container, including all physical and logical partitions.
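For reference, the statistics named above enter the score roughly as follows. This is the textbook Okapi BM25 form with a commonly used non-negative IDF; the exact variant and the values of $k_1$ and $b$ that Cosmos DB uses are not stated in this PR, so treat the symbols as assumptions.

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q)\,
  \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q) = \ln\!\left(1 + \frac{N - n(q) + 0.5}{n(q) + 0.5}\right)
```

Here $f(q, D)$ is the frequency of term $q$ in document $D$, $|D|$ is the document length, $N$ is the number of documents, $n(q)$ is the number of documents containing $q$, and $\mathrm{avgdl}$ is the average document length. $N$, $n(q)$, and $\mathrm{avgdl}$ all depend on which set of documents the statistics are drawn from.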

While this provides a valid and comprehensive representation of statistics for the entire dataset, it introduces challenges for several common use cases.

In multi-tenant scenarios, it is often necessary to isolate queries to data belonging to a specific tenant, typically defined by the partition key or a component of a hierarchical partition key. This enables scoring to reflect statistics that are accurate for that tenant’s dataset, rather than for the entire container. For customers such as Veeam and Sitecore, which operate large multi-tenant containers, this is not just an optimization but a requirement. Their tenants often operate in very different domains, which can significantly change the distribution and importance of keywords and phrases. Using global statistics in these cases leads to distorted relevance rankings.

In other scenarios involving hundreds or thousands of physical partitions, computing statistics across the entire container can become both time-consuming and expensive. Customers may prefer to use statistics derived from only a subset of partitions to improve performance and reduce RU consumption. There is precedent for this: Azure AI Search defaults to this “local” method.

What?

We propose extending BM25 scoring in Cosmos DB so that developers can choose between a global FullTextScore (the existing behavior) and a scoped FullTextScore (statistics computed only over the partition key(s) used in the query). The key aspects:

  • Global BM25: FullTextScore retains its existing behavior and computes BM25 statistics, such as IDF and average document length, across all documents in the container regardless of any partition key filters in the query.
  • Scoped BM25: when a query includes a partition key filter or explicitly requests scoped scoring, the engine computes these statistics only over the subset of documents within the specified partition key values. Query results are still returned only from the filtered partitions, and the resulting scores and ranking reflect relevance within that partition-specific slice of data.
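To make the difference concrete, here is a minimal Python sketch that computes the IDF of one term over the whole container and over a single tenant's documents. The toy corpus, the tenant names, and the `idf` and `scoped` helpers are all hypothetical, and the IDF formula is a common non-negative BM25 variant, not necessarily the one Cosmos DB uses.

```python
import math

# Hypothetical toy container: (tenant_id, text). Not real Cosmos DB data.
DOCS = [
    ("legal", "contract liability clause"),
    ("legal", "liability waiver contract"),
    ("legal", "contract termination notice"),
    ("retail", "summer shoe sale"),
    ("retail", "shoe size guide"),
    ("retail", "return policy contract"),
]

def idf(term, docs):
    """Non-negative BM25-style IDF of `term` over the given documents (assumed variant)."""
    n = len(docs)
    df = sum(1 for _, text in docs if term in text.split())
    return math.log(1 + (n - df + 0.5) / (df + 0.5))

def scoped(docs, tenant):
    """Restrict the statistics corpus to one tenant (one partition key value)."""
    return [d for d in docs if d[0] == tenant]

global_idf = idf("contract", DOCS)                     # statistics over every tenant
legal_idf = idf("contract", scoped(DOCS, "legal"))     # statistics over the legal tenant only
retail_idf = idf("contract", scoped(DOCS, "retail"))   # statistics over the retail tenant only

print(f"global={global_idf:.3f} legal={legal_idf:.3f} retail={retail_idf:.3f}")
```

In this toy data "contract" appears in every legal document, so it carries little weight under that tenant's scoped statistics, while it is rare and therefore heavily weighted for the retail tenant. The global value sits between the two and matches neither tenant's distribution.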

How?

The user issues a query like:

SELECT TOP 10 * FROM c   
WHERE c.tenantId = @tenantId   
ORDER BY RANK FullTextScore(c.text, "keywords") 

The user also sets a new option on QueryRequestOptions called FullTextScoreScope, which takes one of two values: local or global. The request option is inspected, and the query uses scoped or global statistics accordingly.
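The effect of the option on ranking can be sketched in Python as follows. This is an illustrative model only, not the SDK or engine code: the `FullTextScoreScope` enum here merely mirrors the option name above, and `bm25`, `rank`, the `CONTAINER` data, and the `K1`/`B` defaults are assumptions. In both modes only the filtered tenant's documents are returned; the scope decides which documents feed the statistics.

```python
import math
from enum import Enum

class FullTextScoreScope(Enum):
    """Illustrative stand-in for the request option; not the SDK type."""
    GLOBAL = "global"
    LOCAL = "local"

K1, B = 1.2, 0.75  # conventional BM25 defaults (assumed)

def bm25(terms, doc, stats_docs):
    """Score one tokenized doc, taking N, df and avgdl from stats_docs."""
    n = len(stats_docs)
    avgdl = sum(len(d) for d in stats_docs) / n
    score = 0.0
    for term in terms:
        df = sum(1 for d in stats_docs if term in d)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc.count(term)
        score += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * len(doc) / avgdl))
    return score

def rank(container, tenant_id, query, scope):
    """Return (score, text) pairs for one tenant's documents, best first."""
    candidates = [text.split() for t, text in container if t == tenant_id]
    if scope is FullTextScoreScope.LOCAL:
        stats_docs = candidates                                # filtered partition only
    else:
        stats_docs = [text.split() for _, text in container]   # whole container
    terms = query.split()
    scored = [(bm25(terms, d, stats_docs), " ".join(d)) for d in candidates]
    return sorted(scored, reverse=True)

# Hypothetical container: tenant A is a legal firm, tenant B a shoe retailer.
CONTAINER = [
    ("A", "contract review memo"), ("A", "shoe patent memo"),
    ("A", "contract dispute memo"), ("A", "contract renewal memo"),
    ("B", "shoe sale today"), ("B", "shoe size guide"),
    ("B", "shoe return policy"), ("B", "shoe store hours"),
    ("B", "shoe care tips"),
]

local_ranking = rank(CONTAINER, "A", "contract shoe", FullTextScoreScope.LOCAL)
global_ranking = rank(CONTAINER, "A", "contract shoe", FullTextScoreScope.GLOBAL)
```

With local scope, "shoe" is the rare term within tenant A, so "shoe patent memo" ranks first. With global scope, tenant B's documents make "shoe" common and "contract" rare, so the same document drops to last place even though tenant B's documents are never returned.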

Type of change

  • New feature (non-breaking change which adds functionality)

Comment thread docs/query/local_statistics_for_hybrid_search.md
adityasa (Contributor) commented Jan 30, 2026

Is it possible to add some emulator-based e2e tests?
One validation of interest is partition filtering based on:

  1. query filter
  2. QueryRequestOptions

In both cases, the query should honor the FullTextScore scope (local vs. global).

Comment thread Microsoft.Azure.Cosmos/src/RequestOptions/QueryRequestOptions.cs Outdated
adityasa previously approved these changes Feb 3, 2026

adityasa (Contributor) left a comment

:shipit:

sc978345 previously approved these changes Feb 4, 2026

Comment thread Microsoft.Azure.Cosmos/src/Resource/Settings/FullTextScoreScope.cs Outdated

sboshra previously approved these changes Feb 4, 2026

sboshra (Contributor) left a comment

:shipit:

@neildsh neildsh dismissed stale reviews from sboshra, sc978345, and adityasa via 44ba1c6 February 4, 2026 21:00
@neildsh neildsh force-pushed the users/ndeshpan/ftsLocalStatistics branch from 07a86a1 to 44ba1c6 Compare February 4, 2026 21:00

adityasa (Contributor) left a comment

:shipit:

Labels

auto-merge (Enables automation to merge PRs), QUERY

5 participants